Overview

This document details the analyses of the free classification task completed by a total of 194 participants divided into 5 groups: English monolinguals (n = 62), East Asian multilingual speakers (n = 26), Southeast Asian multilingual speakers (n = 28), South Asian speakers (n = 24), and multilingual speakers of other languages (n = 59).

Statistical Analysis

Several statistical analyses were carried out to determine whether language background has an impact on the processing of accented English. First, descriptive statistics are reported showing the total number of categories created by each of the 5 groups, given the same 45 speakers. Next, to determine how appropriate or correct these categories were, 3 error rates were calculated for each group: a 2-category, a 5-category, and a 15-category error rate. The 2-category error rate measured how often participants inappropriately grouped an Asian-language speaker with an English-language speaker, or vice versa. The 5-category error rate measured how often participants inappropriately grouped a speaker from any of the 5 categories (English, International English, East Asian, South Asian, or Southeast Asian) with speakers from another category. Finally, the 15-category error rate measured how often participants inappropriately grouped a speaker of any of the 15 languages with speakers of a different language.

Categories containing a single speaker also counted as an error, since a correct category would contain at least 3 speakers (of the same language). Thus, the maximum number of errors was 45, which would occur if a participant created 45 single-speaker categories. The error rate in each case was the total number of errors divided by the maximum number of errors (45).

The overall label of a participant-created category was whichever true category occurred most frequently within it. In the event of a tie, the overall label was arbitrary and the choice did not affect the total number of errors. For example, if 4 speakers were grouped in a category, and 2 of them were American English and the other 2 were East Asian, this would be counted as 2 errors regardless of the overall category label.
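The error counting described above can be sketched as follows. This is a minimal illustration, not the analysis code itself: the function names, the speaker identifiers, and the data layout (a list of speaker lists per participant) are hypothetical, and it assumes that single-speaker categories count as one error at every granularity.

```python
from collections import Counter

def count_errors(groups, true_label):
    """Count errors for one participant.

    groups: list of lists of speaker ids (the participant's categories).
    true_label: dict mapping speaker id -> true category at the chosen
        granularity (2, 5, or 15 categories).
    """
    errors = 0
    for members in groups:
        if len(members) == 1:
            # Assumption: a single-speaker category counts as one error.
            errors += 1
            continue
        counts = Counter(true_label[s] for s in members)
        # Members outside the most frequent true category are errors.
        # With a tie, the majority size is the same for either label.
        errors += len(members) - max(counts.values())
    return errors

def error_rate(groups, true_label, max_errors=45):
    """Total errors divided by the maximum possible number of errors."""
    return count_errors(groups, true_label) / max_errors

# Worked example from the text: 2 American English + 2 East Asian speakers.
labels = {"s1": "American English", "s2": "American English",
          "s3": "East Asian", "s4": "East Asian", "s5": "East Asian"}
tied_group = [["s1", "s2", "s3", "s4"]]
print(count_errors(tied_group, labels))  # 2
```

Adding a fifth speaker as a category of one (`[["s1", "s2", "s3", "s4"], ["s5"]]`) would raise the count to 3 under the single-speaker assumption.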

In addition to the total number of categories, the closeness of those categories could provide evidence of between-group differences. This analysis makes use of multidimensional scaling (MDS), in which a (dis)similarity matrix allows categorization differences between groups to be visualized and examined in a two-dimensional space. MDS has been used in previous papers in preference to hierarchical clustering (HC) analysis, which assumes that the horizontal category differences/distances are equal. In other words, MDS allows for a more fine-grained analysis of the differences than HC, since it measures differences in two dimensions rather than one (height). MDS was used to assign an x and y coordinate to each speaker (n = 45) for each group (n = 5). For visualization, an ellipse around the points of each category was created using the stat_ellipse function in ggplot2 (cite), based on the method put forward by Fox and Weisberg (2011). A centroid was also calculated for each category within each group by averaging the x and y coordinates. The distance between centroids within a particular group was measured to provide evidence of how distinctly that group separated the categories. As a measure of within-category tightness, the Euclidean distance from each individual point to its centroid was calculated. The figure below shows the MDS for each group. The color of each point was assigned based on the actual category of the language, which was unknown to the participant. Here, a smaller ellipse (points closer together) suggests that the speakers of a particular language group were categorized as more similar to one another.
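The centroid and distance measures described above can be illustrated with a short sketch. It assumes the MDS coordinates are already available as (category, x, y) tuples for one participant group; the function names and the toy coordinates are hypothetical and are not taken from the analysis.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def centroids(points):
    """Average the x and y coordinates of the points in each category."""
    by_cat = defaultdict(list)
    for cat, x, y in points:
        by_cat[cat].append((x, y))
    return {
        cat: (sum(x for x, _ in xy) / len(xy), sum(y for _, y in xy) / len(xy))
        for cat, xy in by_cat.items()
    }

def distances_to_centroid(points):
    """Euclidean distance of each point from its own category centroid."""
    cents = centroids(points)
    return [(cat, dist((x, y), cents[cat])) for cat, x, y in points]

def centroid_distances(points):
    """Euclidean distance between every pair of category centroids."""
    cents = centroids(points)
    return {(a, b): dist(cents[a], cents[b])
            for a, b in combinations(sorted(cents), 2)}

# Toy coordinates (hypothetical, for illustration only).
toy = [("East Asian", 0.0, 0.0), ("East Asian", 2.0, 0.0),
       ("South Asian", 4.0, 3.0), ("South Asian", 4.0, 5.0)]
print(centroids(toy))
print(distances_to_centroid(toy))
print(centroid_distances(toy))
```

Smaller point-to-centroid distances correspond to a tighter category, and larger centroid-to-centroid distances correspond to categories that a group kept further apart.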

Finally, a Bayesian multilevel regression model was run to determine whether there were differences between groups and conditions in terms of how concentrated a given category was. The outcome variable of the model was the Euclidean distance of a point from its centroid, where a smaller Euclidean distance would suggest an overall tighter/more consistent category. The fixed effect predictors included group (5 levels: English monolingual, East Asian, South Asian, Southeast Asian, and Non-Asian multilingual), language group (5 levels: American English, International English, East Asian, South Asian, and Southeast Asian), and their interaction, with a random intercept for individual language. The model used the default brms priors (Student's t distribution with 3 degrees of freedom). The model was run with 4000 iterations of Hamiltonian Monte Carlo sampling (1000 warm-up) across 4 chains and 8 processing cores.
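One way to write out the model described above is given here as a sketch. It assumes a Gaussian likelihood, which the text does not state but which is consistent with the sigma parameter reported in the appendix, and it includes the group by language group interaction terms listed in the detailed parameter table. All symbols are notation introduced here for illustration: d_i is the distance of observation i from its centroid, g[i] its participant group, c[i] its language group, and ℓ[i] its individual language.

```latex
\begin{align}
d_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= \beta_0 + \beta^{\text{group}}_{g[i]} + \beta^{\text{lang}}_{c[i]}
        + \beta^{\text{group} \times \text{lang}}_{g[i],\, c[i]} + u_{\ell[i]} \\
u_{\ell} &\sim \mathrm{Normal}(0, \sigma_u)
\end{align}
```

Under this reading, the intercept corresponds to the reference level of both predictors, and each remaining coefficient is a difference from that reference.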

Results

Total Categories Created

Total categories were calculated as the number of unique groups made by each participant. Mean total categories were then calculated for each group. The figure below shows these means, with the standard deviation in parentheses.
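A small sketch of this summary is shown here. The data layout (each participant as a mapping from speaker id to the label of the category they placed it in), the function name, and the toy data are hypothetical, and the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def category_summary(sortings):
    """Mean and SD of the number of categories per participant group.

    sortings: dict mapping group name -> list of participants, where each
        participant is a dict of speaker id -> chosen category label.
    """
    summary = {}
    for group, participants in sortings.items():
        # Number of unique categories each participant created.
        totals = [len(set(p.values())) for p in participants]
        sd = stdev(totals) if len(totals) > 1 else 0.0
        summary[group] = (mean(totals), sd)
    return summary

# Toy data (hypothetical): two participants sorting four speakers.
toy_sortings = {
    "English monolingual": [
        {"s1": "a", "s2": "a", "s3": "b", "s4": "b"},  # 2 categories
        {"s1": "a", "s2": "b", "s3": "c", "s4": "d"},  # 4 categories
    ]
}
print(category_summary(toy_sortings))
```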

Error Rates

The figure below shows the 2-, 5-, and 15-category error rates for each group. The tables report the same values numerically.

Error rates by each group

2-category error rate per group

Group                 Error rate (SD)
East Asian            0.031 (0.035)
English monolingual   0.023 (0.043)
non_multi             0.052 (0.054)
South Asian           0.016 (0.02)
Southeast Asian       0.034 (0.04)

5-category error rate per group

Group                 Error rate (SD)
East Asian            0.213 (0.077)
English monolingual   0.189 (0.091)
non_multi             0.303 (0.127)
South Asian           0.171 (0.062)
Southeast Asian       0.23 (0.136)

15-category error rate per group

Group                 Error rate (SD)
East Asian            0.551 (0.1)
English monolingual   0.52 (0.092)
non_multi             0.624 (0.101)
South Asian           0.563 (0.123)
Southeast Asian       0.566 (0.114)

MDS

The Bayesian Model

The results of the model are shown below, where the conditional_effects function was used to plot the estimate of each group for each language. Additional model details, including a forest plot, a model table, and a detailed model significance table, are included in the Appendix Items section.

Appendix Items

Dump Category

This boxplot shows the number of members in each participant's largest category (the category that contained the most members); the mean and standard deviation of the largest-category size per group are shown on the right side of the plot. Interestingly, the English monolinguals had the fewest speakers on average in their largest category, while also creating the most categories on average.

The following items show the full Bayesian model: a forest plot, a table of the parameter estimates, and a more detailed table including the probability of an effect being positive/negative (probability of direction; pd), the probability of a “significant” (non-zero) effect (ps), and the Highest Density Interval (including the mean, median, and upper and lower bounds of 95% of the most probable parameter estimates).
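For readers unfamiliar with pd and ps, the sketch below shows how such indices can be computed from a set of posterior draws. It uses one common definition of each index and is not necessarily the exact computation behind the reported values. The function names, the toy draws, and the negligible-effect threshold are all hypothetical.

```python
def probability_of_direction(draws):
    """Share of posterior draws with the same sign as the majority."""
    n = len(draws)
    pos = sum(d > 0 for d in draws) / n
    neg = sum(d < 0 for d in draws) / n
    return max(pos, neg)

def probability_of_significance(draws, threshold):
    """Share of draws beyond a negligible-effect threshold, taken in the
    dominant direction. The threshold is an assumption supplied by the
    caller; the source does not state the value that was used."""
    n = len(draws)
    pos = sum(d > 0 for d in draws) / n
    neg = sum(d < 0 for d in draws) / n
    if pos >= neg:
        return sum(d > threshold for d in draws) / n
    return sum(d < -threshold for d in draws) / n

# Toy posterior draws (hypothetical, far fewer than a real model produces).
toy_draws = [0.5, 1.2, -0.1, 0.8, 2.0]
print(probability_of_direction(toy_draws))          # 0.8
print(probability_of_significance(toy_draws, 0.6))  # 0.6
```

A pd close to 1 means nearly all of the posterior mass lies on one side of zero, while ps additionally requires the effect to exceed the chosen threshold.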

Outcome: dist from center

Predictors            Estimates  CI (95%)
Intercept             1.53       0.18 – 2.85
groupEnglishmono      0.18       -1.57 – 1.92
groupNonmulti         2.30       0.51 – 4.08
groupSouthAsian       0.03       -1.70 – 1.83
groupSoutheastAsian   0.07       -1.73 – 1.86
lang_2East            2.43       0.56 – 4.26
lang_2International   -0.75      -2.63 – 1.10
lang_2South           2.78       0.83 – 4.64
lang_2SouthEast       3.28       1.42 – 5.17

Random Effects
σ²                             3.51
τ₀₀ lang_3                     0.09
ICC                            0.02
N lang_3                       15

Observations                   225
Marginal R² / Conditional R²   0.442 / 0.449

Parameter                                  Component    Median   Mean     MAP      CI    CI_low   CI_high  pd     ps     Rhat   ESS
b_Intercept                                conditional  1.531    1.516    1.664    0.95  0.214    2.876    0.988  0.971  1.006  991.531
b_groupEnglishmono                         conditional  0.177    0.178    0.226    0.95  -1.500   1.968    0.581  0.474  1.008  1141.779
b_groupNonmulti                            conditional  2.302    2.293    2.392    0.95  0.544    4.104    0.992  0.987  1.005  1069.277
b_groupSouthAsian                          conditional  0.028    0.050    -0.075   0.95  -1.594   1.894    0.516  0.414  1.005  1259.040
b_groupSoutheastAsian                      conditional  0.067    0.047    0.068    0.95  -1.627   1.944    0.523  0.425  1.005  1372.438
b_lang_2East                               conditional  2.432    2.433    2.422    0.95  0.559    4.255    0.992  0.988  1.006  1231.306
b_lang_2International                      conditional  -0.755   -0.767   -0.690   0.95  -2.680   1.036    0.793  0.713  1.004  1292.729
b_lang_2South                              conditional  2.780    2.767    2.837    0.95  0.805    4.608    0.998  0.997  1.004  1306.428
b_lang_2SouthEast                          conditional  3.280    3.283    3.423    0.95  1.550    5.264    1.000  0.999  1.006  1193.188
b_groupEnglishmono:lang_2East              conditional  -0.605   -0.595   -0.756   0.95  -3.027   1.843    0.679  0.609  1.005  1437.754
b_groupNonmulti:lang_2East                 conditional  -2.178   -2.154   -2.253   0.95  -4.522   0.520    0.946  0.926  1.004  1363.266
b_groupSouthAsian:lang_2East               conditional  -1.395   -1.419   -1.232   0.95  -3.798   1.091    0.871  0.832  1.004  1569.788
b_groupSoutheastAsian:lang_2East           conditional  0.155    0.170    0.016    0.95  -2.370   2.675    0.544  0.474  1.006  1646.251
b_groupEnglishmono:lang_2International     conditional  0.567    0.573    0.558    0.95  -1.780   3.032    0.678  0.606  1.004  1549.684
b_groupNonmulti:lang_2International        conditional  0.354    0.346    0.587    0.95  -2.128   2.798    0.609  0.536  1.003  1466.576
b_groupSouthAsian:lang_2International      conditional  2.643    2.636    2.582    0.95  0.059    4.941    0.982  0.971  1.003  1657.478
b_groupSoutheastAsian:lang_2International  conditional  3.485    3.504    3.517    0.95  1.001    5.976    0.997  0.994  1.002  1746.711
b_groupEnglishmono:lang_2South             conditional  -1.718   -1.719   -1.564   0.95  -4.205   0.848    0.911  0.879  1.004  1547.022
b_groupNonmulti:lang_2South                conditional  -1.832   -1.822   -1.926   0.95  -4.313   0.635    0.922  0.889  1.004  1516.314
b_groupSouthAsian:lang_2South              conditional  -2.953   -2.942   -3.100   0.95  -5.514   -0.439   0.987  0.979  1.003  1758.951
b_groupSoutheastAsian:lang_2South          conditional  0.566    0.573    0.453    0.95  -1.883   3.039    0.669  0.594  1.003  1676.928
b_groupEnglishmono:lang_2SouthEast         conditional  2.922    2.901    3.021    0.95  0.296    5.269    0.989  0.980  1.007  1384.854
b_groupNonmulti:lang_2SouthEast            conditional  -2.969   -2.967   -2.916   0.95  -5.481   -0.545   0.992  0.985  1.007  1264.185
b_groupSouthAsian:lang_2SouthEast          conditional  -0.548   -0.540   -0.502   0.95  -3.062   1.900    0.666  0.596  1.004  1452.721
b_groupSoutheastAsian:lang_2SouthEast      conditional  -1.407   -1.401   -1.474   0.95  -3.919   1.097    0.865  0.819  1.005  1640.445
sigma                                      sigma        1.868    1.873    1.857    0.95  1.701    2.062    1.000  1.000  1.000  3495.151

Updates:

Here are the new plots:

Individual plots of each parameter estimate from the model with numerical values - these show the same estimates from the complete conditional effects plot, but are re-organized and presented per group, along with the (numerical) parameter estimates themselves with the 95% HDI.

Additive Trees

English monolingual Additive Tree

South Asian Additive Tree

East Asian Additive Tree

Southeast Asian Additive Tree

Non-Asian multilingual Additive Tree